DMS: A Parallel Data Mining Server
Abstract
Tandem’s Data Mining Server (DMS) is a parallel data engine designed to let data mining tools store, access and analyse high volumes of data very efficiently. In contrast to traditional database management systems, its data structures are optimized for analysis and pattern recognition rather than for accessing individual rows. The data stored in DMS tables are encoded automatically, so the required disk space is typically one-third to one-fifth of the raw data size. This approach not only minimizes disk space and disk access time but also enables the majority of processing to be done in memory.
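The abstract does not say which encoding DMS applies to its tables. As a rough illustration only, the sketch below assumes a simple dictionary encoding, in which each distinct column value is replaced by a small integer code. The function names `dictionary_encode` and `count_by_value` are hypothetical and are not part of DMS. The point it shows is that an analysis step such as a frequency count can run directly on the compact codes, touching the original values only when the result is reported.

```python
from collections import Counter


def dictionary_encode(values):
    """Replace each distinct value with a small integer code.

    Returns the value-to-code dictionary and the encoded column.
    This is an assumed scheme for illustration, not the DMS format.
    """
    dictionary = {}
    codes = []
    for value in values:
        # New values get the next unused code; known values reuse theirs.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def count_by_value(dictionary, codes):
    """Count occurrences on the encoded column, decoding only the result."""
    counts = Counter(codes)
    decode = {code: value for value, code in dictionary.items()}
    return {decode[code]: n for code, n in counts.items()}


column = ["gold", "silver", "gold", "bronze", "gold", "silver"]
dictionary, codes = dictionary_encode(column)
frequencies = count_by_value(dictionary, codes)
```

When a column has few distinct values, the codes need far fewer bits than the raw strings, which is one plausible way a store could reach the kind of space reduction the abstract describes.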
Related resources
A Parallel Data Mining Architecture for Massive Data Sets
This paper discusses a parallel data mining architecture which provides the capability to mine massive data sets highly efficiently, scanning millions of rows of data per second. In this architecture the mining process is divided into two distinct components. A parallel server, Compaq’s Data Mining Server (DMS), provides a set of data mining primitives which are utilized by a data mining client...
Data Mining: a Database Perspective
Data mining on large databases has been a major concern in the research community, due to the difficulty of analyzing huge volumes of data using only traditional OLAP tools. This sort of process requires a lot of computational power, memory and disk I/O, which can only be provided by parallel computers. We present a discussion of how database technology can be integrated with data mining techniques. Fin...
A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction
This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a ...
Data-Mining: A Tightly-Coupled Implementation on a Parallel Database Server
Due to the increasing difficulty of discovering patterns in real-world databases using only conventional OLAP tools, an automated process such as data mining is currently essential. As data mining over large data sets can take a prohibitive amount of time related to the computational complexity of the algorithms, parallel processing has often been used as a solution. However, when data does not...